Whole-genome sequencing identified novel mutations in a Chinese family with Lynch syndrome

Background: Lynch syndrome (LS) is caused by a germline mutation in one of the mismatch repair genes (MLH1, MSH2, MSH6, and PMS2) or in the EPCAM gene. The definition of Lynch syndrome is based on clinical, pathological, and genetic findings. Therefore, the identification of susceptibility genes is essential for accurate risk assessment and tailored screening programs in LS monitoring.
Patients and methods: In this study, LS was diagnosed clinically in a Chinese family using the Amsterdam II criteria. To further explore the molecular characteristics of this LS family, we performed whole genome sequencing (WGS) on 16 members of this family and summarized the unique mutational profiles within this family. We also used Sanger sequencing and immunohistochemistry (IHC) to verify some of the mutations identified in the WGS analysis.
Results: We showed that mutations in mismatch repair (MMR) related genes, as well as in pathways including DNA replication, base excision repair, nucleotide excision repair, and homologous recombination, were increased in this family. Two specific variants, MSH2 (p.S860X) and FSHR (p.I265V), were identified in all five members with LS phenotypes in this family. This is the first report of the MSH2 (p.S860X) variant in a Chinese LS family. This mutation would result in a truncated protein. Theoretically, these patients might benefit from PD-1 (programmed death 1) immune checkpoint blockade therapy. The patients who received nivolumab in combination with docetaxel are currently in good health.
Conclusion: Our findings extend the mutation spectrum of genes associated with LS in MSH2 and FSHR, which is essential for future screening and genetic diagnosis of LS.


Introduction
Lynch syndrome (LS) is an autosomal dominant syndrome linked to a variety of cancers, including cancers of the colon, endometrium, and ovary (1,2). LS is mainly caused by germline and epistatic mutations in the human mismatch repair (MMR) genes (3). Maintaining genomic stability is a key function of the MMR proteins (4). During DNA replication, repair, and recombination, the MMR system monitors and corrects errors (5). Several factors contribute to MMR, including MutS homolog 2 (MSH2), MutL homolog 1 (MLH1), MutS homolog 6 (MSH6), post-meiotic segregation increased 2 (PMS2), and epithelial cell adhesion molecule (EPCAM). Clinical, pathological, and genetic findings are used to diagnose LS (6). Detecting LS-related mutations is therefore important for the clinical monitoring of carriers and the genetic testing of relatives at high risk (7).
LS is typically diagnosed clinically based on the Amsterdam or Bethesda criteria (3). Patients with LS are typically screened for mutations in the MMR pathway using genetic testing (8). MLH1 and MSH2 mutations are the most prevalent in LS (90%), followed by MSH6 mutations (10%), while PMS2 mutations are less frequent (9,10).
This study aims to elucidate which variants of the MMR genes could provide a more accurate risk assessment or predictive test for currently 'healthy' members of affected families. We investigated the mutations underlying LS in one family using whole genome sequencing (WGS) and Sanger sequencing.

Patient and ethical statements
In Shenzhen People's Hospital's Department of Medical Oncology, a four-generation Chinese family was diagnosed and treated for LS. According to the Amsterdam II criteria, clinical testing reports, and detailed family history, oncologists made the clinical diagnosis of LS. Informed consent was obtained from all four generations of Chinese family members participating in this study. In accordance with the Declaration of Helsinki, the Ethics Committee of Shenzhen People's Hospital reviewed and approved our study.

DNA extraction
A QIAamp DNA Blood Midi Kit (Qiagen; Valencia, CA, USA) was used to extract DNA from participants' peripheral blood for whole-genome sequencing and Sanger sequencing.

Whole genome sequencing
Covaris focused ultrasonication (Covaris, MA, USA) was used to shear the DNA, and 6 cycles of PCR were used to enrich the DNA fragments. An Agilent 2100 Bioanalyzer was used to analyze the size distribution of the library. Paired-end reads of 150 bp were generated from the DNA libraries on the Illumina HiSeq platform.

Data processing
Trimmomatic (version 0.36) was used to discard raw reads contaminated with adapters and low-quality/unidentified nucleotides. Quality-controlled reads were aligned to the UCSC (University of California, Santa Cruz) human reference genome (GRCh37) using BWA software (11). PCR duplicates were removed and BAM files were indexed using Samtools and Picard. To generate the final BAM file (the binary version of a SAM file), GATK (The Genome Analysis Toolkit) was used to recalibrate the base quality scores (12). GATK was also used to identify germline single nucleotide variants (SNVs), and ANNOVAR (ANNOtate VARiation) was used to annotate and prioritize those variants (13). SIFT, PolyPhen2, and MutationTaster were used to assess the pathogenicity of missense variants. All variants identified in this study were manually checked using Integrative Genomics Viewer (IGV version 2.3.86), and only variants in the coding and splice regions were considered (14). Copy number variants (CNVs) and structural variants were detected using Control-FREEC and Breakdancer, respectively (15,16).
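The restriction to coding and splice-region variants can be illustrated with a minimal sketch. It assumes the annotated variants are available as records carrying a region label in a field named `Func.refGene`, as in ANNOVAR's gene-based output; the example records, the placeholder gene names, and the helper names are hypothetical and are not taken from the study's data or scripts.

```python
# Minimal sketch: keep only variants annotated as exonic or splicing.
# Field name follows ANNOVAR's refGene output; the records are made up.
KEEP_REGIONS = {"exonic", "splicing"}


def in_coding_or_splice(func_field):
    """True if any ';'-separated region label is exonic or splicing."""
    return any(part in KEEP_REGIONS for part in func_field.split(";"))


def filter_variants(records):
    """Return the records that fall in coding or splice regions."""
    return [r for r in records if in_coding_or_splice(r["Func.refGene"])]


# Hypothetical annotated records (GENE_A to GENE_C are placeholders).
records = [
    {"gene": "MSH2", "change": "p.S860X", "Func.refGene": "exonic"},
    {"gene": "GENE_A", "change": ".", "Func.refGene": "intronic"},
    {"gene": "GENE_B", "change": ".", "Func.refGene": "exonic;splicing"},
    {"gene": "GENE_C", "change": ".", "Func.refGene": "ncRNA_exonic"},
]

kept = filter_variants(records)
print([r["gene"] for r in kept])
```

Matching whole labels rather than substrings keeps non-coding RNA exons (`ncRNA_exonic`) out of the result.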

Sanger sequencing
Sanger sequencing was used to validate the candidate variants identified above.

Gene ontology biological process enrichment analysis
Gene Ontology (GO) enrichment analysis was performed with the R package "clusterProfiler", using genes with >=20 mutations as input. An adjusted p-value (p.adj) below 0.05 was used as the cutoff for statistical significance.
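For readers unfamiliar with this type of analysis, the idea behind an over-representation test with an adjusted p-value cutoff can be sketched as follows. This is an illustration only, written in Python rather than R: it assumes a one-sided hypergeometric test and Benjamini-Hochberg adjustment, neither of which is stated in the text, and the function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of seeing at least k term genes among n input
    genes drawn from a background of N genes, K of which carry the term."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: three GO terms tested against the same 28-gene input list.
# (overlap k, background N, term size K, input size n) are invented numbers.
tests = [(5, 1000, 20, 28), (2, 1000, 50, 28), (1, 1000, 100, 28)]
raw = [hypergeom_sf(*t) for t in tests]
adj = benjamini_hochberg(raw)
significant = [i for i, p in enumerate(adj) if p < 0.05]
print(raw, adj, significant)
```

A term is reported only if its adjusted value falls below the 0.05 cutoff, which controls the false discovery rate across all tested terms.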

Statistical analysis
Statistical analysis was performed with R (version 3.6.3) (18-20). Results expressed as mean ± SD (standard deviation) were analyzed using Student's t-test. Differences were considered significant when P < 0.05.
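As a reminder of what the two-sample Student's t-test computes, the sketch below derives the t statistic and degrees of freedom from two groups, assuming equal variances (the classical pooled form). The group values are invented for illustration and the function name is hypothetical. Converting t to a P value requires the t distribution, which a statistics library provides.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance two-sample t statistic and degrees of freedom."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, df


# Invented measurements for two groups (not data from this study).
group_a = [12.0, 15.0, 14.0, 16.0, 13.0]
group_b = [9.0, 11.0, 10.0, 12.0, 8.0]
t_stat, dof = student_t(group_a, group_b)
print(t_stat, dof)
```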

An LS pedigree of four generations
A 53-year-old female with a personal and family history matching the Lynch syndrome phenotype was enrolled in this study as the proband (S99 in Figure 1A). At the age of 49, this patient was diagnosed with endometrial sarcoma (Supplementary Figures 1A-C), ovarian cancer (Supplementary Figures 1D-F), and colorectal cancer (Supplementary Figure 1G, H). We further investigated the proband's 24 relatives within four generations, as illustrated in Figure 1A. Among these 24 relatives, 1 female ancestor in generation 1 was diagnosed with colorectal cancer; 4 out of 5 ancestors from generation 2 were diagnosed with colorectal cancer or pancreatic cancer; and 5 out of 13 participants in generation 3 were diagnosed with one or more of the following diseases: patient 1 (S60) with colorectal polyps; patient 2 (S66) with colorectal polyps; patient 6 (S63) with colorectal cancer, tubular adenocarcinoma, and endometrial cancer; patient 13 (S99) with colorectal cancer, ovarian cancer, and endometrial sarcoma; patient 15 (S102) with colorectal cancer and ovarian cancer. We further performed whole genome sequencing on peripheral blood samples from 16 members of this family (as shown in Figure 1B and listed in Table 1); their SNP (single nucleotide polymorphism) and InDel (insertion and deletion) distribution profiles are shown in Supplementary Figure 2.

The mutational profiles of members from this LS family
To compare the mutational profiles of LS family members with normal profiles, we used mutational profiles from the 1000 Genomes Project (www.internationalgenome.org) as background controls. The mutation frequencies of the EPCAM, MSH2, and PMS2 genes were significantly higher in LS family members than in the 1000 Genomes profiles, whereas the mutation frequency of the MLH1 gene was significantly lower (Figure 2A). Regarding MMR-related pathways, pathways involved in DNA replication, base excision repair, nucleotide excision repair, and homologous recombination had significantly higher numbers of mutations in LS family members than in the 1000 Genomes profiles, as shown in Figure 2B.
The key gene mutations across the family are listed in Table 2. The top 28 genes with high mutation burdens (over 20) are shown in Figure 3. The top enriched GO BP (biological process) terms of these genes are listed in Table 3.

Unique mutational features of 5 members with LS phenotype
To further explore the unique mutational profiles of members with the LS phenotype, we divided the 16 members into two groups: 5 members with the LS phenotype (LSD, including S60: CP; S66: CP; S63: CC+TA+EC; S99: CC+OC+ES; S102: CC+OC; CP, colorectal polyps; CC, colorectal cancer; TA, tubular adenocarcinoma; EC, endometrial cancer; OC, ovarian cancer; ES, endometrial sarcoma) and 11 members without the LS phenotype (LSN, including S28, S100, S35, S14, S69, S27, S16, S8, S63, S59, S39), as illustrated in Supplementary Figure 3. For SNPs, the numbers of each mutation type shared by different numbers of LSD members are shown in Figure 4A; specifically, exonic SNPs were enriched in LSD members compared with LSN members (red). For InDels, the corresponding numbers are shown in Figure 4B; specifically, intronic mutations were enriched in LSD members compared with LSN members (red).
Overall, the level of CNVs (copy number variations) in LSD members was significantly higher than that in LSN members (p < 0.05), as shown in Supplementary Figure 4. In the KEGG (Kyoto Encyclopedia of Genes and Genomes) enrichment analysis, pathways related to cellular senescence, the Hippo signaling pathway, the NOD (nucleotide oligomerization domain)-like receptor signaling pathway, and the PD-L1 (programmed death ligand 1) expression/PD-1 (programmed death 1) checkpoint pathway in cancer had more SNP/InDel mutations in LSD members than in LSN members (Figure 5).
We also summarized missense/stopgain/frameshift mutations that were shared in at least 3 LSD members, as illustrated in [...] SRMS.p.A453T, SRMS.p.P218L, RTN4.p.D151V and ASAH2B.p.S2288C) that were shared in 4 LSD members, and 2 mutations (PDE4DIP.p.S2288C, BCR.p.S1048fs) that were shared in 3 LSD members.
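The sharing analysis described in this section amounts to set operations over per-sample variant lists. The following minimal sketch shows one way to count how many LSD members carry each mutation and to find mutations present in every LSD member but in no LSN member. The sample-to-variant assignments below are invented for illustration (only the sample IDs and variant names come from the text), `VAR_X`, `N1`, and `N2` are placeholders, and the function names are hypothetical.

```python
from collections import Counter


def sharing_counts(variants_by_sample):
    """Count, for each variant, how many samples carry it."""
    counts = Counter()
    for variants in variants_by_sample.values():
        counts.update(set(variants))
    return counts


def exclusive_to_all(affected, unaffected):
    """Variants carried by every affected sample and by no unaffected one."""
    shared = set.intersection(*(set(v) for v in affected.values()))
    seen_in_unaffected = set().union(*(set(v) for v in unaffected.values()))
    return shared - seen_in_unaffected


# Hypothetical assignments; real per-sample calls are not given in the text.
lsd = {
    "S60": {"MSH2.p.S860X", "FSHR.p.I265V", "BCR.p.S1048fs"},
    "S66": {"MSH2.p.S860X", "FSHR.p.I265V", "BCR.p.S1048fs"},
    "S63": {"MSH2.p.S860X", "FSHR.p.I265V", "VAR_X"},
    "S99": {"MSH2.p.S860X", "FSHR.p.I265V", "BCR.p.S1048fs", "VAR_X"},
    "S102": {"MSH2.p.S860X", "FSHR.p.I265V"},
}
lsn = {"N1": {"VAR_X"}, "N2": set()}

counts = sharing_counts(lsd)
at_least_three = sorted(v for v, c in counts.items() if c >= 3)
only_in_lsd = sorted(exclusive_to_all(lsd, lsn))
print(at_least_three, only_in_lsd)
```

With real data, the same two functions would be applied to the annotated variant calls of the 5 LSD and 11 LSN members.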

The validation of the mutations
To validate some of these variants, we performed Sanger sequencing on samples from patient 13 (S99, the proband); the result is shown in Figure 7. The MSH2 (p.S860X) mutation was verified in tumor samples from patient 13. We also found higher expression of MMR-related proteins, including MLH1, MSH2, MSH6, and PMS2, in the tumor tissues of patient 13 than in the paracancerous tissues, as shown in Figure 8.

Therapeutic actions involved in the treatment of LSD members in this family
The proband (patient 13, S99) was diagnosed with ovarian cancer and endometrial sarcoma at age 49 and subsequently underwent radical surgery. The colorectal cancer was found incidentally during the baseline radiological evaluation before adjuvant treatment. As she met the Amsterdam II criteria, we examined her gene mutation profile and found the MSH2 mutation in her peripheral blood samples. After that, this patient received six cycles of nivolumab in combination with docetaxel and cisplatin plus fluorouracil, followed by 8 cycles of single-agent nivolumab. The patient is currently in good condition.
Two female patients with colorectal cancer as well as germline MSH2 mutations received endoscopic submucosal dissection. Among them, one female patient was subsequently diagnosed with colorectal cancer, tubular adenocarcinoma, and endometrial cancer (patient 6, S63), and these tumors were all resected radically; another female patient, who is the sister of the proband, was diagnosed with colorectal cancer (resected in 2003) and ovarian cancer (resected in 2018). Patient 6 had a recurrence of colorectal cancer in early 2022, for which she received 8 cycles of pembrolizumab.
Figure: Genes with high mutational burdens in members of this LS family.
Figure: Comparison of mutation numbers in genes relating to KEGG pathways between LSD members and LSN members.
Figure: Key mutations in 5 LSD patients. A red rectangle indicates that the specific mutation is present in the individual sample.
Figure: Sanger sequencing confirms the MSH2 (p.S860X) mutation in the proband.

Discussion
Lynch syndrome (LS) is an autosomal dominant disorder linked to a high risk of cancer, especially colorectal cancer (21). LS is difficult to diagnose for the following reasons: the diagnosis is mainly based on clinical criteria; family information is currently hard to obtain; and broad clinical phenotype information, such as polyposis information, is not available (3). As a result, the rate of LS diagnosis is far below the actual incidence (9). Early diagnosis is essential for achieving a better therapeutic effect and a good prognosis. Large-scale screening programs might be more beneficial for those who carry the causative mutations and less necessary for those who do not. The identification of mutations that cause LS in LS families is useful for genetic counseling and disease management.
MSH2 was first mapped to 2p21 in 1993, and several deleterious mutations within this gene were identified in LS families (22). Subsequently, many mutations in MMR genes (MLH1, MSH2, MSH6, and PMS2) were also identified in LS families. [...] (25). Follicle-stimulating hormone receptor (FSHR), expressed in vascular endothelial cells of different malignancies, has recently been investigated as a potential pan-receptor for cancer therapy (26-28). The missense mutation (p.I265V) replaces isoleucine with valine; because both amino acids have hydrophobic side chains, it might not cause a dramatic structural change in the FSHR protein. So far, a relationship between FSHR and LS has not been reported, and further work is needed to explore their connection.
In this study, we performed WGS on 16 members of this LS family. First, we tried to answer why members of this family tended to have LS. We found that mutational levels relating to MMR pathways were increased, and that mutational levels in pathways such as DNA replication, base excision repair, nucleotide excision repair, and homologous recombination were also increased. Second, we tried to answer why the 5 LSD members, rather than the other 11 LSN members, had the LS phenotype. Two mutations (MSH2.p.S860X and FSHR.p.I265V) were shared among all 5 LSD members but were not found in the 11 LSN members. According to the HNPCC mutation database, the germline mutation MSH2 (p.S860X) has been reported in HNPCC patients (29); the present study is the first report of this germline variant in a Chinese population. Sanger sequencing confirmed that the proband carried MSH2 (p.S860X).
We hypothesize that the MSH2 mutation (p.S860X) is the primary cause of Lynch syndrome in this family and plays a significant role in its onset. We speculate that the other high-frequency mutations found may not play a major role in the development of LS. The role of some variants in tumor-associated genes in members of this family requires further validation.

Conclusion
In conclusion, our study provides a preliminary exploration of LS pathogenesis from the perspective of a complete LS family pedigree. Our results suggest that key mutations, including MSH2 (p.S860X) and FSHR (p.I265V), as well as increased mutations in MMR-related pathways, could contribute to the incidence of LS.

Data availability statement
The data presented in the study are deposited in GSA Human database (https://ngdc.cncb.ac.cn/gsa-human/), accession number HRA003905.

Ethics statement
The studies involving human participants were reviewed and approved by the ethics committee of Shenzhen People's Hospital. The patients/participants provided their written informed consent to participate in this study.

Author contributions
WH and CZ conceived the research idea. WH and NT prepared and wrote the manuscript. SD, DL and DW performed data analysis. WH, JS, JW and PZ collected the clinical samples. JW, NT and CZ revised the manuscript. All authors contributed to the article and approved the submitted version.